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Abstract 

The sunspot number (SSN), the total solar irradiance (TSI), a TSI reconstruction, and the 
solar flare index (SFI), are analyzed for long-range persistence (LRP). Standard Hurst analysis 
yields H w 0.9, which suggests strong LRP. However, solar activity time series are non-stationary 
due to the almost periodic 11 year smooth component, and the analysis does not give the 
correct H for the stochastic component. Better estimates are obtained by detrended fluctuations 
analysis (DFA), but estimates are biased and errors are large due to the short time records. These 
time series can be modeled as a stochastic process of the form x{t) = y(t) + Oy/y(t) w H {t), where 
y(t) is the smooth component, and wu{i) is a stationary fractional noise with Hurst exponent 
H. From ensembles of numerical solutions to the stochastic model, and application of Bayes' 
theorem, we can obtain bias and error bars on H and also a test of the hypothesis that a process 
is uncorrclated (H = 1/2). The conclusions from the present data sets are that SSN, TSI and 
TSI reconstruction almost certainly are long-range persistent, but with most probable value 
H w 0.7. The SFI process, however, is either very weakly persistent {H < 0.6) or completely 
uncorrelatcd. Some differences between stochastic properties of the TSI and its reconstruction 
indicate some error in the reconstruction scheme. 

1 Introduction 

As the Sun is the main energy source driving Earth processes, variability of Solar output has 
been at the center of scientific interest for centuries. Since Galileo's invention of the telescope, 
sunspots have been systematically counted. It has been noticed that the sunspot number (SSN) 
fluctuates in a more or less random fashion from month to month, but that their annual averages 
vary quasi-periodically with a period of approximately 11 years. Other cycles (for instance on 22 
and 80 years) are also discernible in the sunspot time series, and times of almost vanishing sunspot 
activity have been detected within the historical records. One example is the Maunder minimum 
1630-1720 when sunspots almost disappeared for nearly a century. 

Modern telescopes, and since the late 1970s also space observatories, have also facilitated record- 
ing the variations of solar-flare activity, for instance through the solar flare index (SFI) and mea- 
sured variability of total solar irradiance (TSI) and more recently the solar spectral irradiance (SSI). 
Variations of TSI obviously represent a forcing of Earth climate, while the effect of variations of 
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SSI is more subtle, since different parts of the solar spectrum affect the Earth atmosphere in dif- 
ferent ways. Other climatic influences of solar variability are strongly debated, such as the effect of 
changes in the galactic cosmic ray flux due to variations in the Sun's open magnetic flux associated 
with the solar wind. 

Much attention has been devoted to study correlation between solar activity and climate on 
centennial time scales through various reconstructions of solar activity based on theoretical models 



with sunspot data as input and by means of paleo-data on millennial scales (see review by Usoskin 



(2008)). This interest is obvious since a positive and quantitative identification of such a connection 
would allow a direct empirical estimate of climate sensitivity to solar activity as reviewed by Haigh 
( 2007] ); [gray et pUO ). 

The idea of long-range persistence in the solar records, on short as well as long times scales, is 
not new. Mandelbrot and Wallis (1969) applied rescaled-range (R/S) analysis to monthly SSN and 
found the characteristic bulge on the log(i?/ S) vs. log r curve for time scales r around the period of 
the sunspot cycle, but a common slope corresponding to a Hurst exponent of H « 0.9 in the ranges 
3 < r < 30 months and 30 < r < 100 years. Ruzmaikin et al. (1994) obtained similar results for 
the SSN, and for analysis extended to the time range 100 < r < 3000 years by using a M C proxy 
for cosmic ray flux. Since such behavior of the R/ S-cuive is reproduced by adding a sinusoidal 
oscillation to a fractional Gaussian noise with Hurst exponent H, both groups concluded that the 
stochastic component of the SSN is a strongly persistent fractional noise in the time range from 
months to millennia. Ogurtsov (2004) combined SSN and SSN reconstructions with 14 C proxies to 
obtain H in the range 0.9-1.0 in the time range 25 < r < 3000 years by means of R/S analysis and 
first and second order detrended fluctuation analysis (DFA(l) and DFA(2)) (Huang et al. 1998). 
The statistical significance of the long-memory persistence hypothesis on time scales longer than 
the sunspot period has been questioned by Oliver (1998), using the so-called scale of fluctuation 
approach. 

Another class of approaches to the sun-climate connection has focused on the detection of 
correlations and/or common statistical signatures between proxies of solar activity and climate 
signals on time scales of the sunspot period and shorter. Since the correlation between the solar 
cycle variation of TSI and climate signals like global temperature anomaly (GTA) appears to be 
quite weak, some works have drawn the attention to a possible "complexity-linking" between solar 
activity and Earth climate which is proposed to be discernible by identification of a common long- 
range memory process in the solar and climate signals, represented by a common Hurst exponent 



H (Grigolini et al, 2002, Scafetta and West 2003, 2005, 2008; Scafetta et al, 2004). This view 



was criticized by us in a recent Letter (Rypdal and Rypdal 2010a), and triggered a comment by 



Scafetta and West (2010) and a reply; Rypdal and Rypdal (2010b). Our view is that trends, like the 



sunspot cycle in solar signals and multi-decadal oscillations superposed on a rising trend in global 



temperature signals, will create spuriously high Hurst exponents. For instance (Scafetta and West 



2005) (their Figure 3B) obtain H ~ 0.95 for solar as well as global temperature signals. After such 



trends are accounted for, both solar signals and climate signals exhibit considerably lower Hurst 
exponents for the stochastic component of the signals. In the present paper we will focus on three 
proxies for solar activity, SSN, TSI, and SFI, and demonstrate that the SSN and TSI are the most 
persistent (most probable .ff-values 0.73 and 0.70, respectively) on time scales shorter than the 
sunspot period. Further, for the SFI the most probable value is 0.54, and we cannot falsify the null 
hypothesis that the stochastic component in the solar flare signals exhibit no long-range memory 
[H = 0.5) on time scales 10-1000 days. 
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2 Estimating Hurst exponents from data 



2.1 The special role of the variogram 

Let x(l),x(2),x(3), ... be a stationary process (a noise). We suppose that there exists a continuous- 
time motion X(t), with stationary increments, such that x(n) = X(n + 1) — X{n). The g'th order 
structure function S„(t) is the q'th statistical moment 

s q (t)=E[x(tn 

If the motion X{t) is statistically self-similar the structure functions are power laws 

S q (t)<xt«*\ 

and the scaling function has a linear dependence on q. 

C(?) = Hq, 

where H is the self-similarity exponent. For a self-similar motion the probability density function 
(PDF) of X(t)/a(t), where a 2 (t) = S 2 (t) oc t 2H , is the same for all t. If this PDF is Gaussian 
the process is called a fractional Brownian motion (fBm). For an important class of processes, 
called multifractal, the PDFs change shape with varying time scale. They typically have higher 
flatness (kurtosis) for small t, often converging towards Gaussian for large t. For such processes 
the structure functions are still power laws, but the scaling function is not linear, but has a convex 
shape. In this paper, we shall deal with an even wider class of processes where only the second-order 
structure function, sometimes called the variogram, is assumed to be a power-law, i.e., 

S 2 (t) = E[X{t) 2 } = E[X{1) 2 } t 2H . (1) 

We call H the Hurst exponent of the process x(t). If X(t) is a self-similar process, then H is its 
self-similarity exponent. Thus, in general we shall not require X(t) to be neither Gaussian or self- 
similar. In fact, we do not even require that the process is multifractal since we only assume that 
the second structure function is a power law. The importance of the Hurst exponent is its relation 
to correlations in the noise x(t). We include the derivation of this relationship for completeness, 
and start by writing 

2X(t)X(s)=X(t) 2 + X(s) 2 -[X(t)-X(s)} 2 . (2) 
Since X(t) has stationary increments, and using equation 0, we have that 

E[(X(t) - X(s)) 2 } = E[X(t - s) 2 ] = E [X{l) 2 ]\t - s\ 2H 

, and 

E[X(t) 2 } = E[X(l) 2 ] t 2H , E[X(s) 2 ] = E[X(l) 2 } s 2H , 
which, using equation ([2]), yields the co- variance structure 

E[X(t)X{s)\ = ^E[X(1) 2 } (t 2H + s 2H -\t- s\ 2H ^j . (3) 

Choosing X(Q) = we now have 

E[a?(l)a?(n)] = E (x(l) - X(0)) (x(n + 1) - X{ri) 
= E[X(l)X(n + l)]-E[X(l)X(n)], 
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and by using equation Qwe get 

E[x(l)x(n)] oc (n + l) 2H - 2n 2H + (n - l) 2H 

„ ^-n 2H = 2H(2H-l)n 2H - 2 . 
an z 

This means that the correlation function of x(t) has algebraic decay for all H £ (0, 1) except for 
if = 1/2, for which a;(i) is an uncorrelated noise. For H > 1/2 the integral over the correlation 
function is infinite, and this property is what is denoted as long-range persistence. Again, we 
emphasize that this property only requires a power-law dependence of the variogram, but not that 
the process is self-similar or even multifractal. 



2.2 Detrended fluctuation analysis 

As mentioned in the introduction solar signals contain distinct periodicities, the most prominent 
being the 11-year solar cycle, and these trends will distort the result of the variogram analysis. 
In the next section we will demonstrate several ways of detrending and how these tend to reduce 
the estimated Hurst exponents. Here we shall just briefly describe one systematic method, the 
detrended fluctuation analysis (DFA), which performs quite well on these data. Again we consider 
the discrete and stationary stochastic process {x(k)} and we construct the cumulative sum process 
= S!=o x (^)- Let us divide the sequence X{£) into N s segments of length At, where each 
segment is indexed by I = 1, 2, . . . , N s . Then compute an re'th order polynomial fit p? t ?, to X(i) in 
each segment and compute the variance of the detrended fluctuation in the segment, 

»=i 

The square root of the average of this variance over all the segments, 

V s i=i 

is the n'th order DFA fluctuation function. If an fBm with Hurst exponent H ~ 0.5 is superposed 
on a sinusoidal signal of comparable amplitude, ordinary variogram analysis will give an estimated 
Hurst exponent H ~ 1, while a third order DFA analysis (DFA(3)) will give 

(At) oc At H 

with H close to the the Hurst exponent for the fBm. Thus, there are good reasons to assume that 
given sufficiently long data series DFA(3) would give an accurate estimate of the Hurst exponent 
for the underlying detrended stochastic process. The main problem addressed in this paper is how 
to reduce the uncertainties in the estimates that arise from the limited data sets. But first we 
demonstrate the effect of detrending on the solar data without dealing with uncertainties. 



2.3 The effect of trends 

In Figure [T^, we have plotted the SFI, SSN and a TSI reconstruction over the last four sunspot 
cycles, and the instrumental TSI PMOD composite for the last two cycles. All data are given as 
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daily averages. The thick smooth curves are moving averages weighted by a Gaussian window with 
1 year standard deviation. In order to extract the stochastic component the smoothed signal should 
be subtracted, but the signals are still strongly non-stationary because the amplitudes of the rapid 
fluctuations are much higher at sunspot maxima than at minima. These amplitudes turn out to 
have a distinct statistical dependence on the local moving average y{t). In fact, the variance of the 
fluctuations around y(t) is roughly proportional to y(t), i.e. Y&r[x(t)\y(t) = y] oc y, as shown in 
Figure [2j 

This means that the mean- and amplitude-detrended signals 

z(t) = (x(t) - y<jt))/y/v®, (4) 

which is plotted in FigureflJ), are approximately stationary. The sunspot minima are still discernible 
in the detrended signals since flares and sunspots are absent for long time intervals around these 
minima. This implies that if statistics for the active sun is sought, it may be reasonable to eliminate 
periods of very low activity around sunspot minima from the analysis. In Figure [3^i we have plotted 
variograms of the four raw signals x(t). This yields Hurst exponents in the range 0.88 < H < 0.97 



and corresponds to the results obtained in Figure 3B of Scafetta and West ( 2005 ) . In Figure |3p 
we show results of DFA(3) analysis, which yields Hurst exponents in the range 0.55 < H < 0.67. 
Results similar to the latter is obtained by computing variograms of the detrended signal in Figure 



3 A stochastic model 



As mentioned in section 2.2 the limited length of the observed data records makes it difficult to 
compute error bars in the estimates of Hurst exponents directly from the data. Essentially the 
entire data set is used to estimate one value, and then we do not have more data to assess the 
statistical spread in this estimate. What we need to know to compute such error bars is the PDF of 
estimated values H in an imaginary ensemble of realizations of data sets of the same length as the 
observed record. Such an ensemble can be generated synthetically from a model that is assumed 
to have the the same statistical properties as the observed data, including an hypothesized value of 
H that can be varied. From the numerically generated ensembles one can construct a conditional 
probability density p(H\H) for obtaining an estimated value H for the Hurst exponent, given that 
the "true" exponent is H. Then, by means of Bayes' theorem we can obtain the conditional PDF 
p(H\H), which gives us the probability of having a "true" Hurst exponent H provided we have 
estimated a value H from the observational data. The width of p(H\H) gives us the error bar of our 
estimate. The details of this procedure will be given in the next section, but first we shall describe 
the model used to generate synthetic data, which follows naturally from equation Q. We have 
already observed that z(t) defined from this expression is an approximately stationary stochastic 
process and that variograms estimated from the data are approximate power laws, from which a 
Hurst exponent can be estimated. PDFs of the signals on different time scales turn out to have 
approximately the same shape on the different time scales, but they are not always Gaussian. We 
shall therefore model z(t) as a self-similar (in general non-Gaussian) process with self-similarity 
exponent H. By denoting this fractional noise process by wu{t), we can write 

X(t) =y(t)+<Ty/y~(j)WH(t), (5) 
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where y(t) is a 1 1-year-periodic oscillation with amplitude A, and wji(t) is a fractional non-Gaussian 
noise with unit variance. The relation between parameters a and A are easily estimated from the 
sunspot data by comparing the amplitude of the smoothed signal in Figure [T£i with the amplitude 
of the detrended noise in Figure [lb. 

The fractional noise wjiif) is generated by taking wnif) = Zjj{t + 1) — Zjj(t) where Znif) is a 
process defined by the integral 

Z H (t) = c H f K H (t,t')dZ(t'). (6) 
Here ch is a normalization constant and 

K H {t,t') = {t-t') H + -^-{-a) H + ~y\ 

The process Z(t) is chosen to be a Levy process (a process with stationary and independent in- 
crements) with zero mean and unit variance. If Z{t) is gaussian, implying that it is a Brownian 
motion, then Zg(t) is a fractional Brownian motion with self-similarity exponent H. In this case 
wnit) is a fractional Gaussian noise with Hurst exponent H . Using equation ^ together with the 
independence of increments in Z(t) we obtain the relation 



E[Z H (t) 2 ] = c 2 H J K H (t,t') 2 dt', 



and using the scaling relation Kn(at,al/) = a H 1//2 -FTf/(£, t') it follows that E[Zn{t) 2 ] oc t 2H . 

For any given di stribution^] on Z{t ) (and hence on we can simulate the process Zn{t) 



using the method of Stoev and Taqqu\ ( |2004 ) . For the Sunspot Number the estimated PDF of the 



noise wn(t) is shown in figure |6Jb). We see that it is reasonable to choose a Gaussian distribution 
for wjj(t). For the other time series the Gaussian approximation is not valid (See figures |4fb) 
and |10[b)), and here we fit the data to stable distributions using a maximum likelihood estima- 
tor. These stable distributions are then truncated at values corresponding roughly to the largest 
observed datapoint in the time series. The truncation ensures that the moments of wnit) exist 
and prevents the presence of unrealistically large events that can influence the analysis of esti- 
mated Hurst exponents. To demonstrate that our estimated distributions are reasonable we have 
compared the estimated PDFs for the fluctuations in the solar data with PDFs estimated from 
realizations of the stochastic models. These are presented in figures Kb), [6b andfToFb). 



4 A Bayesian estimate 
4.1 Uniform sample space 

We assume that the observed signal is a realization of the stochastic equation ^ where wu{t) is a 
fractional noise with the measured distribution and with Hurst exponent H £ [0, 1] . We construct 
a uniform (all values of H E I = [0, 1] are equally probable) sample space S of realizations of these 
processes. For each realization with a given H we measure (estimate) a value H. In general H ^ H. 

We can then, for instance, compute the conditional probability density p(H\H) from an ensemble 
of numerical solutions to the stochastic model where for each realization H is chosen randomly in 

In order for Z(t) to be a well-defined Levy process these distributions must be infinitely divisible. However, since 
we do not require our model to hold on time scales shorter than a day, this has no practical importance. 
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the interval /. The conditional PDF p(H\H) can also be computed directly from the ensemble or 
from Bayes' theorem: 

p(H\H) = «£Wm. (7) 
p(H) 

Here p{H) is by construction of S uniform on the unit interval / and hence p(H) = 1. However, 
p(H) is not necessarily uniform and must be computed from the ensemble. 



4.2 A non-uniform sample space 

The uniform sample space is based on a model of reality prior to any observation of H that any 
value of H 6 I is equally probable, i.e. we have no prior preference to what H should be. This 
may not be the most plausible starting point. Another is to ask the question; is the stochastic 
component in the signal long-range correlated or not? In other words, is H = 1/2 or is H ^ 1/2? 
In the uniform sample space we have of course that p(H = 1/2) = 0, so this sample space is 
not a good starting point if we want to assign a non-zero prior probability to the hypothesis that 
H = 1/2. What we can do is to construct a non- uniform sample space S(po) consisting of an 
ensemble of numerical realizations of the fractional process where a fraction po of the realizations 
have H = 1/2 and the remaining fraction 1 — po is chosen with H drawn randomly from the set 
= [0, 1/2) |J(l/2, 1]. Let us formulate the following hypothesis: 

Hypothesis %: The observed signal is described by the stochastic equation with an uncorrelated 
(H = 1/2) stochastic noise term. 



We then define an observation: 



Observation H Q : DFA(3) applied to the observed signal yields an estimated Hurst exponent 
H = H Q . 

From the constructed nonuniform ensemble we can compute the conditional probability that the 
hypotesis H is true provided the prior probability of this being true is po = p(T~L) and we have made 
the observation by DFA of the physical time series that the estimated Hurst exponent is H = H Q . 

First we must create an ensemble S with N elements. Npo elements are realizations with 
H = 1/2. We call this subensemble <Sh=i/2- The remaining N(l —po) elements are realizations with 
H drawn randomly from Ih^i/2- We call this subensemble 5f/^i/2- By doing this we have implicitly 
assumed zero probability to have H outside the unit interval, and hence that p(T~L) + pifl) = 
Po + (1 — po) = 1. Here, the symbol T-L denotes the null hypothesis, H G The conditioned 

subensemble S{H Q ) is the set of all elements for which the estimated H is in an e- neighborhood of 
the observed value H Q . Here e should be a small number, but not smaller than the measurement 
error. Let us consider the fraction p(H \H) of elements in £#=1/2 which belong to S{H ) and the 
fraction p(H \H) of 5h^i/2 which belong to S{H ). These fractions are the conditional probabilities 
that H £ (H Q — e, H Q + e) provided the hypotheses % or the hypothesis rl are true, respectively. It 
is now convenient to introduce the Bayes factor: 

pmm 
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Note that if e is small, both numerator and denominator in this ratio are proportional to e, and 
hence 1Z is independent of e. The Bayes factor is large (1Z 3> 1) if the probability of making the 
observation H = H is much larger if the hypothesis % is true than if % is true. Intuitively then, 
making the observation H Q should support the hypothesis and make p(H\H ) close to unity. On 
the other hand, we have that TZ <C 1 if the probability of H = H Q is much larger if % is true than 
if T~L is true. In that case one should intuitively expect p(H\H ) to be vanishingly small. This can 
be quantified by means of Bayes' theorem, which yields, 

p(H\H ) _ n P_m =n Po 



P {U\H ) p(H) l-po 
Since p(%|if D ) = 1 — p(T~i\H ) we can insert this in the equation above and solve for p(7i\H ): 

We observe that in addition to the Bayes factor also the prior probabilty po enters the expression, 
but the result is not extremely sensitive to the choice of po provided it is not very close to or 
1. The prior probability seeks to quantify our prior knowledge about the truth of the hypothesis 
T~L, and if that knowledge is poor, the most reasonable choice may be po = 1/2, which makes 
(1 -po)/po = 1 and 

In general, the conditional probabilities entering the Bayes factor may be computed from empirical 
data sets or from theoretical models. In our case, we are facing very limited data sets, and we 
therefore compute them from an ensemble of numerically generated realizations of the stochastic 
equation The estimated Bayes factors for the various solar proxies are presented in Table 1. 



5 Results 

Since we are making Bayesian estimates, the results depend on our prior probablities. Using the 
uniform ensemble, this probability distribution is uniform on the unit interval. Using the non- 
uniform sample space the prior probability parameter is po- 



5.1 The data 

The data records analyzed in the following are selected mainly for illustration of the power and 
limitations of the analysis methods, although they also represent interesting proxies that reflect 
different aspects of solar activity. All records analyzed have different lengths, the longest is the 
sunspot number, and the shortest is the TSI composite. It will become clear that the error bars 
in the estimates of the Hurst exponents depend on the lengths of the records, which is why the 
smallest error bars are found in the sunspot number and the largest in the TSI composite. It is 
conceivable that other existing data sets could be used to reduce these error bars, and certainly the 
near future will provide such data sets. 

The solar flare index (SFI) data have been downloaded from the NOAA National Geophysical 
Data Center (NNDGC) in Boulder, Colorado. The file contains daily data from the beginning of 
1966 to the end of 2008. There are no missing data points in this record. There is certainly a wealth 
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of other solar-flare data available, but to our knowledge this is the longest unbroken time-record of 
solar flare activity with daily resolution. 

The daily sunspot number has been downloaded from the Solar Influences Data Analysis Center 
(SIDC) in Brussels, Belgium. The file contains data from 1818, but have missing data points up 
to 1848. After this year the data record is complete, so in our analysis we have used the unbroken 
record from 1848 to 2011. 



The TSI reconstruction is based on Krivova et al. (2007) and the data file is retrieved from 



Max-Planck Institute of Solar System Research (MPS) in Lindau am Hartz, Germany. The data 
file contains daily data from 1611, but with many gaps of missing data points before 1963. For our 
analysis we therefore use the unbroken record from 1963 to the end of 2004. 



The TSI composite data record is based on Frolich et al. (2000J) and is downloaded from 



Physikalisches-Meteorologisches Observatorium Davos/World Radiation Center (PMOD/WRC), 
Switzerland, and is version d41_62_1007. The daily data set goes from January 1976 to July 2010, 
but contains 7% missing data points. The missing data dominates in the early part of the record, 
and for this reason we have discarded all data before 1983. For the data after 1983 there are only a 
small fraction of the data points missing. These missing data points have been reconstructed from 
the stochastic model equation ([5|, using a non-Gaussian white noise for Wh(t). The probability 
density function (PDF) for this noise term is estimated from the data, and the drift term y(t) is 
chosen to be the smoothed trend curve shown in figure [2} Since this smoothed trend curve is based 
on a weighted moving average y(t) is properly defined only up to two years after the start and two 
years before the end the data record, so the time series analyzed in the following goes from 1985 
to 2008. 



5.2 Uniform sample space 

Figure [4^, shows the DFA(3) fluctuation function for the observed SFI time series (lower curve) in 
a log-log plot. The estimated Hurst exponent is H ~ 0.67. For the SFI the DFA(3) estimate is 
particularly sensitive to the solar minimum periods because in these intervals there are long intervals 
with no solar flares and zero SFI, which gives rise to spurious persistence of the signal. If intervals 
where the index is zero is excluded from the analysis on all time scales, we obtain the upper curve 
in the figure, which yields H ~ 0.55. We belive this figure is more representative for the solar-flare 
statistics in the active sun. Figure [4)3 shows the PDF of the detrended and amplitude- adjusted SFI 
(shown in Figure ^p) together with the PDF of the synthetically generated SFI-signal. The dashed 
curve is a fit of a stable distribution to the estimated PDF. The lower panel Figure [4J; shows a 
realization of the synthetically generated un-detrended signal. This signal should be compared to 
the upper trace in Figure [pi. 

In Figure [5^, (crosses) we show the mean estimated H computed from ensembles of synthetic 
realizations of the SFI for 11 different values of H in the unit interval. H is estimated from 
DFA(3) applied to the synthetic x(t) (i.e. the signal with the periodic trend) and the deviation 
from the straight dashed line indicates the systematic bias of the DFA(3) method applied to such 
a signal. When the process is persistent (H > 0.5) the method performs quite well, with a slight 
overestimation of H when the process is strongly persistent (H — > 1). The circles (partly hidden 
by the crosses) are computed from DFA(3) applied to the fractional noise directly. The fact that 
the two curves fall on top of each other shows that the DFA(3) method removes the influence of 
the periodic trend. There is also a statistical spread in the estimated H indicated by the error 
bars, which has a Gaussian conditional PDF p(H\H). This statistical spread is very important in 
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our analysis, since it gives us the opportunity to use Bayes' theorem to compute the error bars on 
the observed Hurst exponent. The spread in estimated H depends on the length of the synthetic 
data record, which we have chosen equal to the observational record. As explained in the previous 
section the conditional PDF p(H\H) can be computed from the same ensemble. In Figure tap the 
conditional cumulative distribution P(H\H) = J_ oo p(H'\H)dH' is computed from the conditional 

PDF p(H\H) by means of the uniform sample space and Bayes' theorem for H = 0.55. From this 
distribution it is easy to compute the conditional mean E,(H\H Q ) (which is the best estimate for 
H given the observation H Q ) and the 95% confidence interval for this estimate. These are given 
in Table [6j and does not rule out the possibility that the SFI stochastic process is uncorrelated 
(H = 0.5). Note that K(H\H Q = 0.55) = 0.54, i.e. that the best estimate for H and the observation 
H Q are slightly different. Below, we shall see that this difference is more substantial in the sunspot 
number. 

Figures [6] and [7] show the same for the sunspot number. Here the PDFs are nearly Gaussian, 
and the 95% confidence interval is narrower than for the SFI because the observed sunspot record is 
much longer. The best estimate for H is 0.73, while H Q = 0.78, but the narrow confidence interval 
excludes the possibility that H = 0.5. 

The results for the TSI are shown in Figures [8] and [9] and for the TSI reconstruction in Figures 



10 and 11 The estimated Hurst exponent is somewhat smaller (H = 0.62) in the reconstruction 
compared to the observational data (H = 0.70). The PDF also appears more skew in the former. 
These differences suggests some systematic errors in the reconstruction procedure that has effect 
on these timescales shorter than the sunspot period. The reason for the larger confidence intervals 
for the TSI compared to the reconstruction, is that the latter time series is longer. 

The mean estimated Hurst exponents and their 95% confidence intervals are summarized in 
Table 1. Except for the SFI this analysis indicates that there is a significant persistence in all solar 
activity proxies analyzed, but their Hurst exponents are considerably lower than obtained by most 
previous authors. The largest H is found in the sunspot number, with the most probable value 
H = 0.73, and with 95% confidence H < 0.76. 

5.3 Non-uniform sample space 

From Table 1 it is tempting to conclude that if > 1/2 also for the SFI, since the H = 1/2 value 
is barely within the 95% confidence interval. However, this result is based on an assumption on 
a uniform prior probability, i.e. we assume all values of H € I is equally probable prior to the 
observation. If, however, prior to observation we assume that there is a finite probability that 



the SFI-process is uncorrelated ( po 0) on the relevant time scales, section 4.2 shows that the 
computation is different and the result will depend on p$. The conditional probability p(%\H ) is 
given by equation^, where the Bayes factor 1Z must be computed from the numerically generated 
ensemble of realizations generated for given values of pq. The curves of p{%\H ) as a function of po 



are shown for TSI and SFI in Figure 12 It appears that the conditional probability that H = 1/2 
is negligible for the TSI , unless we assume a prior probability for this very close to unity (in that 
case our prior prejudice overshadows any computation). However, for the SFI we observe that even 
an agnostic prior view (po = 0.5) will result in a higher probability after observation p(H\H ) ~ 0.8 
and that p{H\H ) > 0.8 if p > 0.5. 
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6 Discussion and conclusions 



The emphasis in this paper is on analysis method for discerning long-range memory in short data 
records with near-periodic trends in local mean and variance. The data records analyzed have been 
chosen mainly for illustration of these methods. Still, from these data we can conclude that Hurst 



exponents for all proxies analyzed are significantly smaller than obtained recently by Scafetta and 



West] (2005) and in the classical papers by Ruzmaikin et al. (1994) and Mandelbrot and Wallis 



(1969). We can also conclude that TSI and SSN time series on the one hand are considerably more 
persistent than the SFI time series, and at present it cannot be excluded that the solar flare activity 
is an uncorrelated process on time scales longer than a few months. We also have indications that 
the particular TSI reconstruction analyzed here (see Table 1) exhibits lower persistence and a skewer 
PDF of the fluctuations than the observed TSI. More accurate and longer reconstruction time series 
are emerging, and as the TSI time series are also growing longer, the methods devised here should 
be able to pinpoint differences in the stochastic properties of the observed and reconstructed signals 
that might have to be addressed by the groups working with reconstructions. 

As measurements of solar spectral irradiance (SSI) over several solar cycles become available, 
time series of irradiance in the UV-range will be interesting to analyze. Persistence in TSI and SSI 
will be interesting to compare to corresponding properties in solar wind parameters, geomagnetic 
indices and terrestrial climate parameters in order to assess the existence of possible sun-climate 
coupling (Scafetta and West, 2003 Rypdal and Rypdal, 2010a), and to identify the coupling mech- 
anisms. 

Stochastic properties of the proxy data at solar maxima could be compared with those at solar 
minima, and if a systematic difference exists, such measures of reconstructed data near the Maunder 
and Dalton minima should be compared to those away from these minima. The objective of this 
exercise would be to establish whether stochastic measures of proxy data could be used as reliable 
proxies for solar activity. 
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Figure 1: (a): Time series for Sunspot Number (SN), Solar Flare Index (SFI), Total Solar Irradiance 
(TSI) reconstruction, and the TSI PMOD composite. The smooth curves are obtained by convolving 
the time series with a gaussian kernel with a one-year standard deviation. The data are normalized 
such that the smooth components have equal maximal amplitude, i.e. the difference between the 
maximal and minimal value is the same for the four smoothed signals, (b): In this figure we have 
subtracted the smoothed signals from the original times series in order to de-trend the time series 
and then divided the detrended signals by the square root of the smoothed signals in (a). Prior to 
this amplitude adjustment, the origins for each of the signals are shifted to the minimal values of 
the smooth signals in (a). 
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Figure 2: For each of the signals x(t) in Figure [T^, the plots show the variance ~V&r[x(t)\y(t) = 
y] conditioned upon the value of the corresponding smoothed signal y(t). When the variance's 
dependence on the smooth signal of the fluctuations are well approximated by a linear function, we 
can model the standard deviation's dependence on the smooth signal by the square-root function. 





TSI reconstruction 


TSI (PMOD) 
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Solar Flare Index 




0.61 


0.71 


0.78 


0.55 


J Hp(H\H )dH 


0.62 


0.70 


0.73 


0.54 


95% confidence 


0.57 < H < 0.69 


0.62 < H < 0.81 


0.68 < H < 0.76 


0.49 < H < 0.61 


n 


7.0 x 10~ 3 


1.6 x 10~ 2 




3.8 



Table 1: 1. row: Hurst exponent H a as estimated by DFA(3) from the observed time series. 2. row: 
The most probable Hurst exponents J H p(H\H ) dH computed from the uniform sample space. 3. 
row: The 95% confidence intervals computed from the uniform sample space. 4. row: The Bayes 
factor computed from the non-uniform sample space. The computaion for the SSN is not shown 
since Bayes factor in this case is obviously extremely small. 
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Figure 3: Variograms computed from the raw (undetrended) data, (b): DFA(3) variogram (square 
of fluctuation function) computed from raw data. The values for H given in the figure is half the 
slope of the dotted lines. These lines are drawn to indicate the typical slope of the corresponding 
variograms. 
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Figure 4: (a): DFA(3) fluctuation function squared computed from the SFI raw data (circles), and 
DFA(3) fluctuation function computed from the SFI raw data, but by omitting all intervals on all 
time scales that contain data points where the SFI is zero, (b): PDF of fluctuations the detrended 
SFI data series (circles), and PDF of the corresponding data generated by the stochastic model, 
(c): A realization of a model-generated SFI time series. 
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Figure 5: (a): Ensemble mean E[H] of estimated Hurst exponent for SFI computed by the DFA(3) 
method. The crosses are computed from an ensemble of realizations generated from the stochastic 
model where the fractional noise wjj(t) used in the model has Hurst exponent H and the PDF 
given in Figure]^). The circles (partly hidden by the crosses) are computed from DFA(3) applied 
to the fractional noise directly. The Gaussian statistical spread in the estimated H is given by the 
error bars, (b): The conditional cumulative distribution P(H\H) = p(H'\H)dH' computed 

from the conditional PDF p(H\H) by means of the uniform sample space and Bayes' theorem for 
H = 0.55. 
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Figure 6: (a): DFA(3) fluctuation function squared computed from the SNN raw data (circles), 
(b): PDF of fluctuations the detrended SSN data series (circles), and PDF of the corresponding 
data generated by the stochastic model, (c): A realization of a model-generated SSN time series. 




Figure 7: Same as Figure[5j but for SSN time series. The reason for the narrower confidence interval 
in panel (b) compared to those in Figure [5] is that the SSN time series analyzed is considerably 
longer than the SFI time series. 
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Figure 8: DFA(3) fluctuation function squared computed from theTSI raw data (circles), (b): 
PDF of fluctuations the detrended TSI data series (circles), and PDF of the corresponding data 
generated by the stochastic model, (c): A realization of a model-generated TSI time series. 
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Figure 9: Same as Figure [5] and Figure [7J but for the TSI time series. The reason for the wider 
confidence interval in panel (b) compared to those in Figure [5] and [7] is that the TSI time series 
analyzed is considerably shorter than the SFI and SSN time series. 
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Figure 10: DFA(3) fluctuation function squared computed from the TSI reconstruction raw data 
(circles), (b): PDF of fluctuations the detrended TSI reconstruction data series (circles), and PDF 
of the corresponding data generated by the stochastic model, (c): A realization of a model-generated 
TSI reconstruction time series. 
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Figure 11: Same as Figure |9j but for the TSI reconstruction time series. The reason for the slightly 
narrower confidence interval in panel (b) compared that in Figure [9] is that the TSI reconstruction 
time series analyzed is somewhat longer than the TSI time series. 
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Figure 12: The conditional probability p(%\H ) versus prior probability po for (a):TSI, and (b): 
SFI, computed from the non-uniform sample space. 
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